GH-126910: Make _Py_get_machine_stack_pointer return the actual stack pointer #149103
markshannon wants to merge 4 commits into python:main
…ething close to it), but not the frame pointer
* Make _Py_ReachedRecursionLimit inline again
* Remove _Py_MakeRecCheck, replacing its use with _Py_ReachedRecursionLimit
* Move the check for C stack switching into _Py_CheckRecursiveCall
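As a rough illustration of the commit list above, an inline recursion check built on the stack pointer could look like the sketch below. Everything in it is hypothetical: `demo_thread_state`, `demo_stack_pointer`, `demo_reached_recursion_limit`, `demo_set_limit` and `demo_recurse` are stand-ins, not CPython's actual definitions. It assumes GCC or Clang, a stack that grows downward, and it falls back to the address of a local on targets other than x86-64 and AArch64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for per-thread state; not CPython's struct. */
typedef struct {
    uintptr_t soft_limit;   /* lowest address the C stack may safely reach */
} demo_thread_state;

/* Approximate the machine stack pointer (assumption: GCC-style inline asm). */
static inline uintptr_t
demo_stack_pointer(void)
{
    uintptr_t result;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile("movq %%rsp, %0" : "=r" (result));
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ volatile("mov %0, sp" : "=r" (result));
#else
    char here;                       /* fallback: address of a local */
    result = (uintptr_t)&here;
#endif
    return result;
}

/* Fast path, cheap enough to inline at call sites: one read, one compare. */
static inline int
demo_reached_recursion_limit(const demo_thread_state *ts)
{
    return demo_stack_pointer() <= ts->soft_limit;
}

/* Place the soft limit `margin` bytes below the current stack position. */
static inline void
demo_set_limit(demo_thread_state *ts, uintptr_t margin)
{
    ts->soft_limit = demo_stack_pointer() - margin;
}

/* Recurse until the check trips; returns the depth at which it tripped. */
static int
demo_recurse(const demo_thread_state *ts, int depth)
{
    volatile char pad[256];          /* make each frame consume some stack */
    pad[0] = (char)depth;
    if (demo_reached_recursion_limit(ts)) {
        return depth;
    }
    int result = demo_recurse(ts, depth + 1);
    pad[1] = (char)result;           /* work after the call blocks tail-call elimination */
    return result;
}
```

Calling `demo_set_limit(&ts, 64 * 1024)` and then `demo_recurse(&ts, 0)` returns once the stack has grown about 64 KiB past the starting point. In a design like this, any slower handling would live in an out-of-line function that runs only after the inline check trips.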
🤖 New build scheduled with the buildbot fleet by @markshannon for commit 01fe604 🤖
Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149103%2Fmerge
If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 98073f5 🤖
Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149103%2Fmerge
If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.
@pablogsal

Perf should add the Python function below that one, yes. I need time to investigate. I will try to do it this week, but I am a bit overwhelmed with pending things. I will take a look as soon as possible.
I spent several hours digging into this one and was able to reproduce the CentOS9 NoGIL failure locally in a CentOS Stream 9 podman container with GCC 11.5 and perf 5.14.

The short version is: the PR itself did not break perf-map generation. The generated perf map still contains the […]

On the failing build, […]

    __builtin_frame_address(0)

After this PR, on x86-64 it reads the real stack pointer with inline asm:

    __asm__("{movq %%rsp, %0" : "=r" (result));

The new behavior is the right semantic direction for the stack-pointer work, but the old […]

That matters because the perf trampoline test depends on frame-pointer unwinding. On the failing CentOS9/GCC 11.5 build, […]

    __attribute__((optimize ("no-tree-slp-vectorize")))

With the new […]

    _PyEval_EvalFrameDefault:
      push %r15
      push %r14
      push %r13
      push %r12
      push %rbx
      sub $0x180,%rsp
      ...
      mov %rsp,%rax

With the old […]

    _PyEval_EvalFrameDefault:
      push %rbp
      mov %rsp,%rbp
      push %r15
      ...

So before this PR, the test passed because […]

The fix is to keep the new real stack-pointer behavior, but make the eval-loop optimization attribute explicitly preserve frame pointers too:

    #define DONT_SLP_VECTORIZE \
        __attribute__((optimize ("no-tree-slp-vectorize", "no-omit-frame-pointer")))

I verified this in the CentOS Stream 9 reproduction:

    push %rbp
    mov %rsp,%rbp
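To make the difference between the two values concrete, here is a small sketch that reads both the frame address and the stack pointer inside one function. It is an illustration under stated assumptions, not code from the PR: `DEMO_KEEP_FRAME_POINTER`, `demo_regs` and `demo_snapshot` are hypothetical names, the code assumes GCC or Clang, the inline asm covers only x86-64 and AArch64 (other targets fall back to the address of a local), and the `optimize` attribute is applied only on GCC because Clang does not implement it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro with the same shape as the proposed fix: a per-function
 * request to keep the frame pointer. Expands to nothing outside GCC. */
#if defined(__GNUC__) && !defined(__clang__)
#  define DEMO_KEEP_FRAME_POINTER \
       __attribute__((optimize("no-tree-slp-vectorize", "no-omit-frame-pointer")))
#else
#  define DEMO_KEEP_FRAME_POINTER
#endif

typedef struct {
    uintptr_t sp;   /* machine stack pointer (or an approximation) */
    uintptr_t fp;   /* frame address as reported by the compiler builtin */
} demo_regs;

/* Capture both values from the same frame so they can be compared. */
DEMO_KEEP_FRAME_POINTER
static demo_regs
demo_snapshot(void)
{
    demo_regs r;

    /* The old approach: ask the compiler for this function's frame address. */
    r.fp = (uintptr_t)__builtin_frame_address(0);

    /* The new approach: read the stack pointer register directly. */
#if defined(__x86_64__)
    __asm__ volatile("movq %%rsp, %0" : "=r" (r.sp));
#elif defined(__aarch64__)
    __asm__ volatile("mov %0, sp" : "=r" (r.sp));
#else
    char here;                       /* fallback: address of a local */
    r.sp = (uintptr_t)&here;
#endif
    return r;
}
```

On x86-64 and AArch64 the stack grows downward, so `demo_snapshot().sp` is never above `demo_snapshot().fp`. The gap between them is the part of the frame that lies below the frame address. Whether a given function actually sets up `%rbp` can be checked by disassembling the object file with `objdump -d` and looking for the `push %rbp` / `mov %rsp,%rbp` prologue quoted in the comment.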
(or something close to it), but not the frame pointer
This is a rebase of #147945, which was reverted due to a non-reproducible buildbot failure.